Enhanced Hierarchical Clustering for Genome Databases

Authors

  • Sadiq Hussain
  • Gopal Hazarika
Abstract

Clustering techniques find interesting and previously unknown patterns in large-scale data embedded in a large multidimensional space, and they are applied to a wide variety of problems such as customer segmentation, biology, data mining, machine learning, and geographical information systems. Clustering algorithms must scale efficiently with both the dimensionality of the data sets and the size of the database. Hierarchical clustering methods in particular are widely used to find patterns in multidimensional data. In this paper, we design an enhanced hierarchical clustering algorithm that scans the dataset and calculates the distance matrix only once. Our main contribution is to reduce running time, even when a large database is analyzed. The results of hierarchical clustering are represented as a binary tree, which makes the grouping clear and helps to locate clustered objects easily. Our algorithm retrieves the number of clusters with the help of a cut distance and measures clustering quality with a validation index in order to obtain the best clustering; it does not require an initial parameter such as the number of clusters.
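The abstract does not spell out the authors' algorithm, so the following is only a minimal sketch of the general pipeline it describes, written under our own assumptions: Euclidean distance, single linkage, candidate cut distances taken from the tree's merge heights, and the silhouette index standing in for the unnamed validation index. The data is scanned once to build the distance matrix; every later step (merging, cutting, validation) reads only that matrix. All function names (`distance_matrix`, `build_tree`, `cut`, `silhouette`, `best_cut`) are hypothetical, and the naive closest-pair search is O(n^3), so this illustrates the idea rather than the paper's time savings.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    """Node of the binary merge tree (dendrogram)."""
    members: List[int]               # indices of the original points below this node
    height: float = 0.0              # linkage distance at which this node was formed
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def distance_matrix(points: Sequence[Point]) -> List[List[float]]:
    """Single scan of the data: all pairwise Euclidean distances."""
    n = len(points)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = math.dist(points[i], points[j])
    return d


def build_tree(d: List[List[float]]) -> Node:
    """Single-linkage agglomeration driven only by the precomputed matrix d.

    Assumes at least one point. Cluster distances are updated with the
    single-linkage rule d(k, a+b) = min(d(k, a), d(k, b)), so the raw data
    is never rescanned.
    """
    w = [row[:] for row in d]        # working copy; d itself stays intact
    active = {i: Node(members=[i]) for i in range(len(d))}
    while len(active) > 1:
        ids = sorted(active)
        # closest pair of currently active clusters
        a, b = min(((i, j) for x, i in enumerate(ids) for j in ids[x + 1:]),
                   key=lambda p: w[p[0]][p[1]])
        active[a] = Node(members=active[a].members + active[b].members,
                         height=w[a][b], left=active[a], right=active[b])
        del active[b]
        for k in active:
            if k != a:
                w[a][k] = w[k][a] = min(w[a][k], w[b][k])
    return next(iter(active.values()))


def cut(root: Node, h: float) -> List[List[int]]:
    """Clusters obtained by cutting the binary tree at distance h."""
    if root.height <= h or root.left is None or root.right is None:
        return [sorted(root.members)]
    return cut(root.left, h) + cut(root.right, h)


def merge_heights(node: Node) -> List[float]:
    """All internal-node heights, used as candidate cut distances."""
    if node.left is None or node.right is None:
        return []
    return [node.height] + merge_heights(node.left) + merge_heights(node.right)


def silhouette(d: List[List[float]], clusters: List[List[int]]) -> float:
    """Mean silhouette width (needs >= 2 clusters); closer to 1 is better."""
    label = {i: c for c, members in enumerate(clusters) for i in members}
    scores = []
    for i, c in label.items():
        own = [j for j in clusters[c] if j != i]
        if not own:                  # common convention: singletons score 0
            scores.append(0.0)
            continue
        a = sum(d[i][j] for j in own) / len(own)
        b = min(sum(d[i][j] for j in other) / len(other)
                for k, other in enumerate(clusters) if k != c)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def best_cut(points: Sequence[Point]) -> Optional[Tuple[float, float, List[List[int]]]]:
    """Try every merge height as a cut distance; keep the best-scoring one."""
    d = distance_matrix(points)      # computed once, reused below
    root = build_tree(d)
    best = None
    for h in sorted(set(merge_heights(root))):
        clusters = cut(root, h)
        if 2 <= len(clusters) < len(points):
            score = silhouette(d, clusters)
            if best is None or score > best[0]:
                best = (score, h, clusters)
    return best
```

For example, on two well-separated groups of three points, `[(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]`, `best_cut` selects the cut distance 1.0 and returns the two groups `[[0, 1, 2], [3, 4, 5]]`, without the number of clusters being supplied in advance.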

Similar Resources

Graph Clustering by Hierarchical Singular Value Decomposition with Selectable Range for Number of Clusters Members

Graphs have many applications in real-world problems. When we deal with huge volumes of data, analyzing the data is difficult or sometimes impossible. In big data problems, clustering data is a useful tool for data analysis. Singular value decomposition (SVD) is one of the best algorithms for clustering graphs, but we do not have any choice to select the number of clusters and the number of members ...

ROCKET: A Robust Parallel Algorithm for Clustering Large-Scale Transaction Databases

We propose a robust and efficient algorithm called ROCKET for clustering large-scale transaction databases. ROCKET is a divisive hierarchical algorithm that makes the most of recent hardware architecture. ROCKET handles the cases with the small and the large number of similar transaction pairs separately and efficiently. Through experiments, we show that ROCKET achieves high-quality clustering ...

Grid-based Clustering in the Content-based Organization of Large Image Databases

In the image databases, there is often a need to organize the images automatically by their content. For the content-based organization of the images, clustering operations can be applied. We present a new, efficient method for the clustering of large image databases. The method is based on hierarchical clustering of the image database using grid. In this paper, the grid-based clustering method...

The New Software Package for Dynamic Hierarchical Clustering for Circles Types of Shapes

In data mining, efforts have focused on finding methods for efficient and effective cluster analysis in large databases. Active themes of research focus on the scalability of clustering methods, the effectiveness of methods for clustering complex shapes and types of data, high-dimensional clustering techniques, and methods for clustering mixed numerical and categorical data in large databases. ...

iDEP: An integrated web application for differential expression and pathway analysis

iDEP (integrated Differential Expression and Pathway analysis) is a web application that reads in gene expression data from DNA microarray or RNA-Seq and performs exploratory data analysis (EDA), differential expression, and pathway analysis. The key idea of iDEP is to make many powerful R/Bioconductor packages easily accessible by wrapping them under a graphical interface, alongside annotation...

Publication date: 2011